1 Comparison of BCM and PCA Learning in a Realistic Binocular Environment

Different models that attempt to explain how cortical receptive fields evolve have been proposed over the years. Typically these models are distinguished by their learning rule, the representation of the visual environment, and the architecture of the network.

Most of these models assume a simplified representation of the visual environment or a second-order correlation function of the visual environment. Realistic representations of the visual environment have only very recently been considered.

Recently we have realistically modeled the two-eye visual environment. We study how orientation selectivity and ocular dominance form simultaneously; in particular, we study the effect of image misalignment between the two eyes on receptive field formation.

We have compared how image misalignment affects receptive fields under two different learning rules: PCA, in the form proposed by Oja in 1982, and BCM. We have chosen to examine these two because they are well defined and have stable fixed points.

We have shown that binocular misalignment has very different effects on these two learning rules. For the BCM learning rule, misalignment is sufficient to produce varying degrees of ocular dominance, whereas for the PCA learning rule binocular neurons emerge independent of the misalignment.

1.1 A Binocular Visual Environment Composed of Natural Images

We have used a set of 24 natural scenes. These pictures were taken at Lincoln Woods State Park and scanned into 256 × 256 pixel images. We have avoided man-made objects because they have many sharp edges and straight lines, which make it easier to achieve oriented receptive fields.
Figure 1: Three of the natural images used (top), processed by a Difference of Gaussians filter and presented at the bottom.

We have chosen to model the effect of retinal preprocessing by convolving the images with a difference of Gaussians (DOG) filter with a center radius of one pixel ($\sigma_1 = 1.0$) and a surround radius of three pixels ($\sigma_2 = 3$).¹ The effect of this preprocessing is shown in Figure 1.
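As a rough illustration of this preprocessing step, the sketch below builds a discrete DOG kernel with $\sigma_1 = 1.0$ and $\sigma_2 = 3$ and filters an image with it. It assumes choices the text does not specify: each Gaussian is normalized to unit sum before subtraction (so the kernel has zero net weight), the kernel is truncated at a radius of $3\sigma_2$, and only the "valid" part of the convolution is kept. The function names are hypothetical, and pure Python is used for self-containment, so it is far too slow for full 256 × 256 images.

```python
import math

def gaussian_kernel(sigma, radius):
    """2D Gaussian sampled on a (2*radius+1) x (2*radius+1) grid, normalized to unit sum."""
    size = 2 * radius + 1
    k = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)]
         for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def dog_kernel(sigma_center=1.0, sigma_surround=3.0, radius=None):
    """Center Gaussian minus surround Gaussian; both unit-sum, so the DOG sums to ~0."""
    if radius is None:
        radius = int(math.ceil(3 * sigma_surround))  # assumed truncation at 3 sigma
    center = gaussian_kernel(sigma_center, radius)
    surround = gaussian_kernel(sigma_surround, radius)
    return [[c - s for c, s in zip(rc, rs)] for rc, rs in zip(center, surround)]

def convolve_valid(image, kernel):
    """'Valid' 2D filtering; the DOG kernel is symmetric, so correlation equals convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

# A uniform patch is mapped to (almost) zero, as expected of a zero-sum
# center-surround filter: only local contrast survives the preprocessing.
kernel = dog_kernel()
flat = [[5.0] * 20 for _ in range(20)]
filtered = convolve_valid(flat, kernel)
```

The zero-sum normalization is what makes the filter discard mean luminance; with a different normalization the surround weight would have to be tuned separately.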

As illustrated in Figure 2, the input vectors from both eyes are chosen as small, partially overlapping, circular regions of the preprocessed natural images; these converge on the same cortical cell.


[Figure 2 panel title: Binocular Model]


Figure 2: Schematic diagram of the two-eye model, including the visual input preprocessing.

The inputs from the right and left eyes are denoted by $\mathbf{d}^r$ and $\mathbf{d}^l$ respectively, and the output of the cortical neuron is then $c = \sigma(\mathbf{d}^r \cdot \mathbf{m}^r + \mathbf{d}^l \cdot \mathbf{m}^l)$, where $\sigma$ is the nonlinear activation function of each neuron. We have used a non-symmetric activation function to account for the fact that neuronal activity, measured relative to spontaneous activity, has a longer way to go up than to go down to zero activity.
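The exact form of the non-symmetric activation function is not given here. The sketch below therefore uses a hypothetical piecewise-tanh form, with an assumed ceiling of 1.0 above spontaneous activity and an assumed floor of only 0.1 below it, to show how the binocular output $c = \sigma(\mathbf{d}^r \cdot \mathbf{m}^r + \mathbf{d}^l \cdot \mathbf{m}^l)$ is computed from the two eyes' inputs.

```python
import math

def asym_sigmoid(x, upper=1.0, lower=0.1):
    """Hypothetical non-symmetric transfer function: saturates at +upper above
    spontaneous activity (0) but only at -lower below it; the slope is 1 at x = 0."""
    if x >= 0:
        return upper * math.tanh(x / upper)
    return lower * math.tanh(x / lower)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def binocular_output(d_right, m_right, d_left, m_left, sigma=asym_sigmoid):
    """c = sigma(d^r . m^r + d^l . m^l): both eyes converge on one cortical cell."""
    return sigma(dot(d_right, m_right) + dot(d_left, m_left))

c = binocular_output([1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5])
```

Any monotone function with a large positive range and a small negative range would serve the same purpose; the tanh pieces are used only because they join smoothly at zero.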

¹ This ratio between the center and surround is biologically plausible and enables the PCA rule to produce oriented receptive fields.




In order to examine the effect of varying the overlap between the receptive fields, we define an overlap parameter $O = s/(2a)$, where $a$ is the receptive field radius in pixels and $s$ is the linear overlap in pixels, as shown in Figure 2. When the left and right receptive fields are completely overlapping, $O = 1$; when they are completely separate, $O < 0$.
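Assuming that $s$ is measured along the line joining the two receptive field centers, two circles of radius $a$ whose centers are a distance $\delta$ apart overlap linearly by $s = 2a - \delta$, so a given overlap parameter corresponds to a center offset $\delta = 2a(1 - O)$; a negative $s$ is then a gap. The helpers below are a hypothetical sketch of that bookkeeping, including an option for a vertical rather than horizontal shift.

```python
def overlap_parameter(s, a):
    """O = s / (2a): s is the linear overlap and a the receptive field radius (pixels)."""
    return s / (2.0 * a)

def center_offset(O, a):
    """Distance between left- and right-eye receptive field centers for overlap O,
    assuming the overlap s = 2a - offset is measured along the line joining them."""
    return 2.0 * a * (1.0 - O)

def eye_centers(x, y, O, a, horizontal=True):
    """Hypothetical helper: place the two receptive fields symmetrically about (x, y),
    shifted horizontally (default) or vertically."""
    half = center_offset(O, a) / 2.0
    if horizontal:
        return (x - half, y), (x + half, y)
    return (x, y - half), (x, y + half)

# With 20-pixel-diameter receptive fields (a = 10):
offsets = {O: center_offset(O, 10) for O in (1.0, 0.6, 0.2, -0.2)}
left, right = eye_centers(50, 50, 0.6, 10)
```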

In order to assess the degree of cell binocularity, we introduced an ocular dominance measure $B$ based on the left- and right-eye responses: $B = (L - R)/(L + R)$. $B$ is calculated by first finding the orientation at which the cell has the greatest binocular response to a sinusoidal grating, and then measuring $L$ and $R$, the left- and right-eye responses at that orientation.
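The procedure for $B$ can be sketched as follows, assuming the left-eye, right-eye and binocular responses have already been measured at the same set of test orientations (for example, peak responses to sinusoidal gratings) and that $L + R > 0$. The function name is hypothetical. $B = 1$ means a purely left-eye cell, $B = -1$ a purely right-eye cell, and $B = 0$ a perfectly balanced binocular cell.

```python
def ocular_dominance(left, right, binocular):
    """B = (L - R) / (L + R), where L and R are the left- and right-eye responses
    at the orientation giving the greatest binocular response.

    left, right, binocular: responses at the same list of test orientations
    (assumed precomputed and non-negative, with L + R > 0)."""
    best = max(range(len(binocular)), key=lambda k: binocular[k])
    L, R = left[best], right[best]
    return (L - R) / (L + R)

# Three test orientations; the binocular peak is at the second one (L = 4, R = 1).
b = ocular_dominance([1.0, 4.0, 2.0], [1.0, 1.0, 2.0], [2.0, 5.0, 4.0])
```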

1.2 Cortical plasticity learning rules

We have employed these realistic visual inputs to test two of the leading visual cortical plasticity rules that have been used to model various normal rearing and visual deprivation experiments: principal components analysis (PCA) and the Bienenstock, Cooper and Munro (BCM) model.

Principal components analysis (PCA) is one of the most widely used feature extraction methods for pattern recognition tasks. PCA features are those orthogonal directions that maximize the variance of the projected distribution of the data.

A simple interpretation of the Hebbian learning rule is that, with appropriate stabilizing constraints, it leads to the extraction or approximation of principal components. This has often been modeled. The learning rule that we use was proposed by Oja (1982) and has the form $\Delta m_i = \eta\,[d_i c - c^2 m_i]$, where $d_i$ is the presynaptic activity at synapse $i$, $c$ is the postsynaptic activity, $m_i$ is the synaptic efficacy of junction $i$, and $\eta$ is a small learning rate. This learning rule has been shown to converge to the principal component of the data.
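A minimal sketch of Oja's rule, assuming a linear neuron ($c = \mathbf{m} \cdot \mathbf{d}$) and a synthetic two-dimensional "environment" in place of image patches: the inputs have variance 4 along the first axis and 0.25 along the second, so the principal component is $(\pm 1, 0)$. The learning rate, sample count and initial weights are illustrative choices; the weight vector should end up close to a unit vector along the first axis.

```python
import random

def oja(samples, m, eta=0.002):
    """Oja (1982): delta m_i = eta * (d_i * c - c**2 * m_i), with linear output c = m . d."""
    m = list(m)
    for d in samples:
        c = sum(mi * di for mi, di in zip(m, d))
        m = [mi + eta * (di * c - c * c * mi) for mi, di in zip(m, d)]
    return m

# Synthetic inputs: standard deviation 2 along x, 0.5 along y (seeded for repeatability).
rng = random.Random(0)
data = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)) for _ in range(20000)]
m = oja(data, m=(0.3, 0.3))
```

The $-c^2 m_i$ term is the stabilizing constraint mentioned above: without it the plain Hebbian update $\eta\, d_i c$ would grow the weights without bound.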

The BCM theory was introduced to account for the striking dependence of the sharpness of orientation selectivity on the visual environment. We use a variation due to Intrator and Cooper (1992) for a nonlinear neuron with a non-symmetric sigmoidal transfer function. Using the above notation, the synaptic modification is governed by $\dot{m}_j = \phi(c, \theta_M)\, d_j$, where the neuronal activity is given by $c = \sigma(\mathbf{m} \cdot \mathbf{d})$, $\phi(c, \theta_M) = c(c - \theta_M)$, and $\theta_M$ is a nonlinear function of some time-averaged measure of cell activity, which in its simplest form is given by $\theta_M = E[c^2]$, where $E$ denotes the expectation over the visual environment. The transfer function $\sigma$ is non-symmetric around 0 to account for the fact that cortical neurons show low spontaneous activity. The neuron can thus fire at a much higher rate relative to the spontaneous rate, but can go only slightly below the spontaneous rate.
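A minimal sketch of BCM learning, under simplifying assumptions that differ from the simulations described here: a linear transfer function ($\sigma$ = identity rather than the non-symmetric sigmoid), a toy environment of two orthogonal, equiprobable input patterns, and $\theta_M = E[c^2]$ computed exactly over that environment at every step instead of as a running time average. With these assumptions the neuron becomes selective: it responds with $c = \theta_M = 2$ to one pattern and not at all to the other.

```python
def bcm(patterns, m, eta=0.1, steps=2000, sigma=lambda x: x):
    """Batch BCM learning over a finite set of equiprobable input patterns.

    dm_j = eta * E[phi(c, theta_M) * d_j], with phi(c, theta_M) = c * (c - theta_M),
    c = sigma(m . d) and theta_M = E[c**2] recomputed exactly at every step.
    """
    m = list(m)
    n = len(patterns)
    for _ in range(steps):
        cs = [sigma(sum(mi * di for mi, di in zip(m, d))) for d in patterns]
        theta = sum(c * c for c in cs) / n  # sliding threshold theta_M = E[c^2]
        grad = [sum(c * (c - theta) * d[j] for c, d in zip(cs, patterns)) / n
                for j in range(len(m))]
        m = [mi + eta * g for mi, g in zip(m, grad)]
    return m

# Two orthogonal stimuli; the slightly larger initial weight wins.
patterns = [(1.0, 0.0), (0.0, 1.0)]
m = bcm(patterns, m=(0.3, 0.2))
responses = [sum(mi * di for mi, di in zip(m, d)) for d in patterns]
```

The selective state is stable because any response below $\theta_M$ is depressed, while the threshold itself rises with $E[c^2]$ and so stops the winning response from growing indefinitely.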

1.3 Results

In all the results reported here we used a fixed circular receptive field with a diameter of 20 pixels. We tested the robustness of the results to receptive fields 10 to 30 pixels in size and found no qualitative difference.
BCM neurons acquire selectivity to various orientations in the partially overlapping and non-overlapping cases as well. When receptive fields are misaligned, various ocular dominance preferences may occur even for the same overlap. This result stands in sharp contrast to the one obtained with PCA neurons: only binocular neurons with a preferred horizontal direction emerge under the PCA rule.


Figure 3: BCM neurons with different overlap values; $O = 1, 0.6, 0.2, -0.2$ from top to bottom. The ocular dominance histograms summarize the ocular dominance of 100 cells at each overlap value. The dependence of ocular dominance on visual overlap is evident.

The BCM receptive field formation results are summarized in Figure 3. Receptive field misalignment does not affect the orientation selectivity of the dominant eye, but does produce varying degrees of ocular dominance, depending on the degree of overlap between the receptive fields. The main result is that ocular dominance depends strongly (even for single-cell simulations) on the degree of overlap between the visual inputs to the two eyes.

The PCA results are presented in Figure 4. As mentioned above, the degree of overlap between receptive fields does not alter the optimal orientation: whenever a cell is selective, its preferred orientation is horizontal. The degree of overlap does affect the shape of the receptive fields and the degree of orientation selectivity that emerges under PCA: orientation selectivity decreases as the amount of overlap decreases.
Figure 4: Receptive fields for partially overlapping inputs using the PCA rule. Receptive field for an overlap value of $O = 0.6$ (top left). Receptive field for a small overlap, $O = 0.2$ (top right). Receptive field for no overlap, $O = -0.2$ (bottom left). Receptive field for a shift in the vertical direction between the visual inputs when $O = 0.5$ (bottom right). In all cases the cell is binocular and horizontal. The symmetry property evident in these receptive fields is analyzed in Shouval et al. (1995).

However, when there is no overlap at all, one again obtains greater selectivity. For PCA, there is also a symmetry between the receptive fields of the two eyes. This arises from invariance to a parity transformation that imposes binocularity.

We also studied the possibility that, under the PCA rule, different orientation-selective cells would emerge if the misalignment between the two eyes were in the vertical direction, but this produced horizontal binocular cells as well.

The PCA results described above were quite robust to the introduction of nonlinearity in the cell's activity; there was no qualitative difference in the results when a non-symmetric sigmoidal transfer function was used.

Thus we conclude that, in a realistic visual environment, the BCM neuron develops selectivity to all orientations, as well as varying ocular dominance. This is consistent with observation. In contrast, the PCA neuron is unable to develop selectivity to all orientations, and the cells are always binocular, which is not in agreement with observation.
Introduction

Modification of synaptic effectiveness between neurons in cortex is widely believed to be the physiological basis of learning and memory; further, there is now evidence that similar synaptic plasticity occurs in many areas of mammalian cortex (Kirkwood et al., 1992). In 1982, Bienenstock, Cooper and Munro (BCM) proposed a concrete synaptic modification hypothesis in which two regions of modification (Hebbian and anti-Hebbian) were stabilized by the addition of a sliding modification threshold.

There are two ways to test a theory like that of Bienenstock, Cooper and Munro. One is to compare its consequences with experiment; the other is to directly verify its underlying assumptions. Recently, both avenues of research have supported this model of plasticity. Physiological experiments have verified some of its basic assumptions, while analysis and simulations have shown that the theory can explain existing experimental observations of selectivity and ocular dominance plasticity in kitten visual cortex in a wide variety of visual environments, and can make testable predictions.

The BCM theory was originally created to explain the development of orientation selectivity and binocular response of neurons in various visual environments in kitten striate cortex, one of the most thoroughly studied areas in neuroscience. The research philosophy of our laboratory is to keep our model of the cortex as simple as possible and to add details after its behavior and consequences have been thoroughly understood. In this paper we present a more realistic representation of the previously simplified visual environment. The effects on our previous findings, and the additional ways in which the extension allows further comparisons with visual cortex, will be examined.

Research in this area began with Nass and Cooper (1975), who explored a model in which the modification of visual cortical synapses was Hebbian; i.e., a change to a synapse was based on the multiplication of the pre- and postsynaptic activities, and stabilization of the synaptic weights was produced by stopping modification when the cortical response reached a specified maximum, thus tying local modifications to the total cortical response. The idea that the sign of the modification should be based on whether the postsynaptic response is above or below a threshold was incorporated by Cooper et al. (1979) (see Fig. 1) to explain variations in selectivity with different visual environments. To stabilize the synapses without having to impose external constraints on them, the threshold was allowed, by Bienenstock et al. (1982), to slide as a non-linear function of
Abstract

Model selection is based upon the generalization errors of the models under consideration. To estimate the generalization error of a model from the training data, the method of cross-validation and the asymptotic form of the jackknife estimator are used. The average of the predictive errors is used to estimate the generalization error, and this estimate is also used as the model selection criterion. The asymptotic form of this estimate is obtained. An asymptotic model selection criterion is also provided for the case when the error function is the penalized negative log-likelihood. In the regression case, the paper also proves the asymptotic equivalence of Moody's model selection criterion and the cross-validation method under a condition on the error function.

Keywords: Asymptotics, Cross-validation, Generalization error, Jackknife estimator, Kullback-Leibler measure
1. INTRODUCTION

Owing to the flexibility and capability of neural networks in modeling the underlying nonlinear functional relation or decision (Barron & Barron, 1988; Hinton, 1989; White, 1989; Hornik, Stinchcombe, & White, 1989), they are popular in data analysis and AI research. One usually starts with a probability description of the process, then parametrizes the probability model by a neural net function and, at the same time, introduces a prior distribution on the weights of the neural net and other parameters in the probability model. We use the term model in a general sense to mean the parametrized probability model, including the prior probability.

Various forms of error functions can be used to estimate the parameters of a model. However, it is more important to select, on the basis of the training data set, the right model: one with small generalization error. Model selection is the topic of this article. In Section 2, we introduce the definition of the generalization error and use the method of cross-validation (Stone, 1974) to estimate it. This estimate is unbiased and is used as the model selection criterion. In Section 3, an asymptotic form of the jackknife estimator (Miller, 1974) is provided to reduce the computational costs incurred in the cross-validation method. In Section 4, the asymptotic form of the model selection criterion is given. In Section 5, the asymptotic model selection criterion is provided for the case when the error function is the penalized negative log-likelihood function; Akaike's Information Criterion (AIC) (Akaike, 1973) and Moody's extension (Moody, 1992) in the regression case are also discussed. It also shows the asymptotic equivalence between Moody's model selection criterion and the method of cross-validation when the distance between the response y and the regression function is measured by the square of their difference.
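To make the cross-validation estimate concrete, the sketch below computes the leave-one-out average of squared predictive errors for two hypothetical least-squares regression models (a constant and a straight line) and selects the model with the smaller estimate. The models, the squared-error loss and the noiseless toy data are illustrative assumptions; the asymptotic jackknife and penalized-likelihood criteria discussed in the article are not implemented here.

```python
def fit_constant(xs, ys):
    """Least-squares constant model: predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def loo_cv_error(fit, xs, ys):
    """Average squared predictive error, each point predicted by a model trained without it."""
    errors = []
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - predict(xs[i])) ** 2)
    return sum(errors) / len(errors)

def select_model(models, xs, ys):
    """Return the model name with the smallest cross-validation estimate, plus all scores."""
    scores = {name: loo_cv_error(fit, xs, ys) for name, fit in models.items()}
    return min(scores, key=scores.get), scores

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x for x in xs]  # noiseless linear data, for illustration only
best, scores = select_model({"constant": fit_constant, "line": fit_line}, xs, ys)
```

Each left-out point is predicted by a model that never saw it, which is why the average estimates generalization rather than training error; the price is one refit per data point, the computational cost that the asymptotic jackknife form is meant to reduce.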


2. GENERALIZATION ERROR AND ITS UNBIASED ESTIMATE

There has been a substantial amount of work on the problem of model selection (Lindley, 1968; Mallows, 1973; Akaike, 1973; Stone, 1974; Atkinson, 1978; Schwarz, 1978; Craven & Wahba, 1979; Zellner, 1984; MacKay, 1991; Moody, 1992). One way to assess the goodness of a model is through the Kullback-Leibler measure (Kullback & Leibler, 1951). Denote the underlying conditional probability distribution as
Abstract

In this paper we address the question of how interactions affect the formation and organization of receptive fields in a network composed of interacting neurons with Hebbian-type learning. We show how to partially decouple single-cell effects from network effects, and how some phenomenological models can be seen as approximations to these learning networks. We show that the interaction affects the structure of receptive fields. We also demonstrate how the organization of different receptive fields across the cortex is influenced by the interaction term, and that the type of singularities depends on the symmetries of the receptive fields.
1 Introduction

Receptive fields in the visual cortex of cats are dramatically influenced by the visual environment (for a comprehensive review, see Fregnac and Imbert, 1984). In normally reared animals, the population of sharply tuned neurons increases monotonically, whereas in dark-reared animals it initially increases but then almost disappears (see, for example, Imbert and Buisseret, 1975). Ocular dominance is dramatically influenced by such manipulations as monocular deprivation (Wiesel and Hubel, 1963) or reverse suture (Blakemore and Van Sluyters, 1974; Mioche and Singer, 1989).

These striking variations are generally believed to be the result of experience-dependent synaptic modification.

There is now also evidence that LTP (long thought to be a possible physiological substrate of memory) and LTD occur in a similar fashion in hippocampus and many areas of mammalian cortex (Kirkwood et al., 1993). It seems quite possible, therefore, that LTP and LTD are manifestations of the same phenomena of synaptic change as those assumed to be taking place in visual cortex, and that all of these involve similar modifications of synaptic efficacy: the physiological basis of learning and memory.

Many different synaptic modification rules have been proposed over the years, both to explain how cortical receptive fields evolve and to account for learning and memory storage in general (for example von der Malsburg 1973, Nass and Cooper 1975, Perez et al. 1975, Bienenstock et al. 1982, Linsker 1986, Miller 1994a).

In this paper we begin an attempt to distinguish between these rules, exploring to what extent they lead to results in agreement or disagreement with experiment. Although it has been stated that the precise form of the learning rule is not important (that any stabilized Hebbian modification rule leads to more or less the same conclusions), our results show that this is not correct. Furthermore, these statements are misleading; the details of the learning